visual reasoning AI News List | Blockchain.News

List of AI News about visual reasoning

Time Details
2025-11-26 11:09
Chain-of-Visual-Thought (COVT): Revolutionizing Visual Language Models with Continuous Visual Tokens for Enhanced Perception

According to @godofprompt, the research paper 'Chain-of-Visual-Thought (COVT)' introduces a method that lets Visual Language Models (VLMs) reason with continuous visual tokens rather than only text-based chains of thought. The model generates mid-thought visual latents such as segmentation cues, depth maps, edges, and DINO features, which act as a 'visual scratchpad' for spatial and geometric reasoning. Reported results include a 14% improvement in depth reasoning, a 5.5% gain on CV-Bench, and major gains on the HRBench and MMVP benchmarks. The technique is compatible with leading VLMs such as Qwen2.5-VL and LLaVA, and its visual tokens can be decoded for interpretability. Notably, the research finds that text-only reasoning chains degrade visual reasoning performance, whereas COVT's visual grounding improves counting, spatial understanding, and 3D awareness and reduces hallucinated outputs. These findings point to business opportunities for AI solutions that require fine-grained visual analysis, accurate object recognition, and reliable spatial intelligence, especially in robotics, autonomous vehicles, and advanced multimodal search. (Source: @godofprompt, Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens, 2025)

2025-10-07 19:45
Google DeepMind Launches Gemini 2.5 Computer Use: AI Model Built for Automated Web Browsing

According to Google DeepMind, the new Gemini 2.5 Computer Use model uses visual understanding and reasoning to let AI agents operate a browser by clicking, scrolling, and typing as a human user would. This makes AI agents more practical for automated online tasks and could streamline workflows in areas such as customer support, e-commerce, and data entry. Google DeepMind reports that the model outperforms previous versions on multiple industry benchmarks with improved speed and reliability, which makes it a strong option for businesses seeking to automate complex web-based operations. (Source: Google DeepMind, Twitter, Oct 7, 2025)

2025-06-11 17:00
Meta Unveils V-JEPA 2: Advanced Self-Supervised Vision AI Model for Business Applications

According to Yann LeCun (@ylecun), Meta has released V-JEPA 2, a new version of its self-supervised vision model designed to improve visual reasoning and understanding without relying on labeled data. V-JEPA 2 uses a joint embedding predictive architecture (JEPA), which is intended to make training more efficient and to generalize better across varied visual tasks. The release is expected to create business opportunities in industries such as autonomous vehicles, retail analytics, and healthcare imaging by lowering data annotation costs and accelerating the deployment of AI-powered vision systems. (Source: @ylecun, June 11, 2025)
